[WIP] : improve current timing script #114
Conversation
Any way to show text for both the time speedup and the time itself?
I often care more about speed-ups when the length of time is large -- the current values shown make it hard to see how the length of time changes between 50 and 300 nodes. Should I care that parallel is slower for 50 nodes?
Any ideas about why it would be slower for 50 nodes? It is using all processors (n_jobs=-1) -- what is making it slower?
We could display both of them by making an annotation dataframe that stores formatted strings. Something like this:
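A rough sketch of what that could look like (illustrative only -- the axis values, variable names, and random data below are made up, not taken from the actual timing script):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# made-up axes and data, just to show the annotation idea
number_of_nodes = [50, 100, 300, 500]
edge_prob = [0.8, 0.6, 0.4, 0.2]
rng = np.random.default_rng(42)
speedups = pd.DataFrame(
    rng.uniform(0.5, 4, (4, 4)), index=edge_prob, columns=number_of_nodes
)
times = pd.DataFrame(
    rng.uniform(0.1, 30, (4, 4)), index=edge_prob, columns=number_of_nodes
)

# annotation DataFrame of formatted strings: "<speed-up>x" over "<parallel time>s"
annot = speedups.round(2).astype(str) + "x\n" + times.round(2).astype(str) + "s"

ax = sns.heatmap(speedups, annot=annot, fmt="", cmap="Greens")
ax.set_xlabel("Number of Vertices")
ax.set_ylabel("Edge Probability")
plt.savefig("timing_heatmap_annotated.png")
```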
Another way could be to create two dataframes (one for the parallel run time and another for speed-ups) and overlaying one over the other. For our use case, I think the annotation dataframe would be much simpler.
not sure why the speed-up with 500 nodes is less than with 300 nodes. Could you please help me understand this?
How long does it take to run and create this timing picture? If it isn't very long, try running it a few times and see if the dip in performance between 300 and 500 goes away.
If the results change from run to run, we should try to pick "better" parameters that don't show those slowdowns. Most computers these days have background processes running that can impact timing scripts. To see through the haze of such noise, we could try multiple runs and take the minimum time from those runs. Another approach is to choose larger problems so the small differences from background processes look more like noise.
It's also nice to repeatedly double the parameter (or multiply by a constant factor other than 2, like 10). That allows you to see whether the time is increasing like n^2 or n^3. Doubling nodes will take ~8 times longer for O(n^3). Of course, the big-O is only valid in the limit as n -> infinity, so take it all with a grain of salt. But if it is consistently a factor of 8 longer when doubling, you can be fairly confident it is running an O(n^3) code.
In this case, since we don't care so much about graphs with 50 nodes and we do care about large graphs, we should probably increase the number of nodes. Maybe start with 250 nodes and double it up to 4000 for this picture. Lots of problems should really be run on 10_000-100_000 nodes. But that would take a long time to do the timing... :)
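A rough sketch of that kind of timing loop (a sketch only, not the repo's script -- the graph generator, edge probability, and the `backend="parallel"` call are assumptions):

```python
import time
import networkx as nx

def best_of(func, *args, repeat=5, **kwargs):
    """Minimum wall-clock time over `repeat` runs, to damp background noise."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - t0)
    return best

n = 250
while n <= 4000:  # 250, 500, 1000, 2000, 4000 -- doubling exposes the scaling
    G = nx.fast_gnp_random_graph(n, 0.5, seed=42)
    t_seq = best_of(nx.betweenness_centrality, G)
    t_par = best_of(nx.betweenness_centrality, G, backend="parallel")
    print(f"n={n}: {t_seq:.2f}s sequential, {t_par:.2f}s parallel, "
          f"speed-up {t_seq / t_par:.1f}x")
    n *= 2
```

If the minimum time grows by a roughly consistent factor each doubling (~8x for O(n^3)), that is the scaling signal described above.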
I guess the point of these plots is to show the impact of parallel code. So the best results would show a factor near 2 for njobs=2 and near 5 for njobs=5. And the title or caption should say what njobs was for the experiment.
:)
@dschult Thank you for your insights!
> How long does it take to run and create this timing picture? If it isn't very long, try running it a few times and see if the dip in performance between 300 and 500 goes away.
It didn’t take long to run, so I executed it a few more times to check if the dip was just a one-off. What I observed was that the speedup would initially peak, but larger numbers of nodes then wouldn't yield any further speedup.
After some debugging, I found the issue was in the `is_reachable` function, where `num_in_chunk` was being passed instead of `n_jobs`:
```python
num_in_chunk = max(len(G) // n_jobs, 1)
node_chunks = nxp.chunks(G, num_in_chunk)
```
needs to be replaced with:
```python
node_chunks = nxp.chunks(G, n_jobs)
```
This is being addressed in PR #112.
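To see why this matters, here is a toy illustration (assuming, as the fix implies, that the second argument of `nxp.chunks` is the number of chunks to produce; `make_chunks` below is a hypothetical stand-in, not the real utility):

```python
def make_chunks(iterable, n_chunks):
    # hypothetical stand-in for nxp.chunks: split into n_chunks roughly equal parts
    items = list(iterable)
    k, extra = divmod(len(items), n_chunks)
    out, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < extra else 0)
        out.append(items[start:end])
        start = end
    return out

nodes = range(500)
n_jobs = 8
num_in_chunk = max(len(nodes) // n_jobs, 1)   # 62
print(len(make_chunks(nodes, num_in_chunk)))  # 62 chunks -> far more tasks than workers
print(len(make_chunks(nodes, n_jobs)))        # 8 chunks  -> one per worker
```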
After applying that fix, I ran it 5 times and took the minimum time from these runs for both `is_reachable` (ref. heatmap) and `betweenness_centrality` (ref. heatmap). However, I still observe a decline in speedup at higher node counts. Could this be due to system limitations?
System details:
MacBook Pro 13-inch M2
macOS Sequoia 15.5
Apple M2 Chip
Memory 8GB
Yay!! So good to hear you found one fix. Too bad there are still some mysteries.
Let's see -- are we running into memory issues? (Probably not: 1600 nodes -> 2.5M edges for the complete graph (prob=1), and each takes about 32-128 bytes, so I think the 8GB memory would be OK unless the data has to be copied many times in the parallel version.) I think they share memory, but I'm not sure.
Let's hope it is another bug -- one that involves code that affects both of these functions. I guess start in the utility functions that are called in both functions.
Hi @dschult, I had previously added another commit to move numpy to the test dependencies, but I think it might not have been pulled. Would you like me to reapply that?
Yes -- I'm very sorry to have lost that commit to the PR. I didn't pull the latest changes when I rebased on the main branch. It all looks good now that you reapplied that. 🚀
In the recent commit, I tried to implement memmapping: I replaced NetworkX's graph object with a NumPy-based adjacency matrix, because the problem involved making multiple copies of the input graph for each worker. The heatmap obtained gives speedups ranging from 11x to 25x with this change (view heatmap). Note: the current implementation assumes that the graph nodes are integers -- I plan to make this more dynamic soon. One potential drawback is the memory usage in the case of sparse graphs.
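For reference, a minimal sketch of the memmapping idea (this is not the PR code -- the chunking and the toy per-chunk function are assumptions): joblib can automatically dump large NumPy arrays to a memmap, so workers read one shared on-disk adjacency matrix instead of each receiving a pickled copy of the graph.

```python
import networkx as nx
import numpy as np
from joblib import Parallel, delayed

G = nx.fast_gnp_random_graph(500, 0.5, seed=42)
nodelist = list(G)
A = nx.to_numpy_array(G, nodelist=nodelist, dtype=np.uint8)  # dense adjacency

def neighbors_of_chunk(A, rows):
    # toy per-worker job: read rows of the shared adjacency matrix
    return {r: np.flatnonzero(A[r]).tolist() for r in rows}

row_chunks = np.array_split(np.arange(len(nodelist)), 4)
# arrays bigger than max_nbytes are dumped to a memmap and shared read-only
results = Parallel(n_jobs=-1, max_nbytes="1M")(
    delayed(neighbors_of_chunk)(A, rows) for rows in row_chunks
)
```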
Interesting that they have special tools for numpy arrays, but not for python objects.

```python
nodelist = list(G)  # map of integers to nodes
nodemap = {n: i for i, n in enumerate(nodelist)}
```

Those are good speeds -- so it looks like memory might be an issue more generally too. It might be possible to use a numpy matrix in a more memory-efficient way than the adjacency matrix. Another approach would be to use a scipy sparse array. So, there are lighter-memory ways to handle adjacencies. But let's not worry about that yet.

Do I understand the memmap model? The array is written to a file and then each process has read access to that file. So it uses hard disk space instead of memory to share the data. It looks like there is a Python standard library version of this called mmap. I wonder if that works well with dicts. But numpy is very fast, so it might be better to just stick with it.

Very cool!
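A sketch of the lighter-memory alternative mentioned above (not part of the PR): the same node-to-integer mapping works with a SciPy sparse adjacency, which stores only the edges that exist.

```python
import networkx as nx

G = nx.fast_gnp_random_graph(100, 0.1, seed=1)
nodelist = list(G)                                 # integer index -> node
nodemap = {n: i for i, n in enumerate(nodelist)}   # node -> integer index
A = nx.to_scipy_sparse_array(G, nodelist=nodelist, format="csr")

# neighbors of nodelist[0], read straight from the CSR structure
neighbors = [nodelist[j] for j in A.indices[A.indptr[0]:A.indptr[1]]]
```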
I like this idea of using an NxM numpy array as an adjacency list -- it would be an efficient way of expressing it. I'll work along these lines. Thanks for the insights :)
Yep, that’s pretty much it!
+1 for moving the adjacency stuff to a separate PR. :)
Do we still need that file?
I think we can delete that file too.
This reverts commit 472d395.
timing/timing_individual_function.py (Outdated)

```python
ax.set_xlabel("Number of Vertices", fontweight="bold", fontsize=12)
ax.set_ylabel("Edge Probability", fontweight="bold", fontsize=12)

n_jobs = nxp.get_n_jobs()
```
I'd recommend using `get_active_backend()[1]` instead of `get_n_jobs` here, because the first one gives the `n_jobs` that joblib has stored and used while running the code (and we are using joblib's config system in this script).
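A small sketch of the difference being discussed (assuming joblib's config is what the script sets; the `n_jobs=6` value is just an example):

```python
from joblib import parallel_config
from joblib.parallel import get_active_backend
import nx_parallel as nxp

with parallel_config(n_jobs=6):
    backend, n_jobs = get_active_backend()
    print(n_jobs)             # 6 -- what joblib will actually use here
    print(nxp.get_n_jobs())   # may come from a different config system
```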
hmmm...
It seems confusing that `get_n_jobs` doesn't get whichever config system's value is being used here. Would that be hard to do? Are there other utility functions like it that we could make work with whichever config system is being used?
> I'd recommend using `get_active_backend()[1]` instead of `get_n_jobs` here, because the first one gives the `n_jobs` that joblib has stored and used while running the code (and we are using joblib's config system in this script).
I'm more aligned with merging this after PR #122; then `n_jobs` would be set via `nx.config.backends.parallel.n_jobs`.
This reverts commit f45627d.
This reverts commit 924bc27.
Co-authored-by: Aditi Juneja <[email protected]>
Just leaving my comments in the conversations for this PR. I don't think any of these points are blocking -- we can do them in later PRs too. But we might as well decide here and avoid it later.
- add printing of probability of edges along with the number of nodes when running the timing.
- move the random seed setting to the beginning of the timing script.
I'll look into whether we can update the config utility functions like `get_n_jobs` to report based on whichever config system is currently being used.
improves #51